Burr Distribution (burr, Burr Type III / Dagum)#
The Burr (Type III) distribution (known in economics as the Dagum distribution) is a flexible family of distributions on the positive real line with polynomial (heavy) tails.
It is a good default when:
your data are strictly positive (income, sizes, waiting times, lifetimes)
the right tail can be much heavier than Lognormal/Gamma
you want a model with interpretable tail behavior and an easy sampler
Learning goals#
By the end you should be able to:
write the PDF/CDF/quantile function and interpret the parameters
compute moments (and know when they do not exist)
derive expectation/variance and the likelihood
sample from `burr` using inverse-transform sampling (NumPy-only)
fit and use the distribution via `scipy.stats.burr`
Notation#
Shape parameters: \(c > 0\), \(d > 0\)
Random variable: \(X \sim \mathrm{BurrIII}(c, d)\)
Standard support: \(x > 0\)
Table of contents#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
import scipy
from scipy import stats
from scipy.special import gammaln, psi, logsumexp
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
print("plotly:", plotly.__version__)
numpy: 1.26.2
scipy: 1.15.0
plotly: 6.5.2
1) Title & Classification#
Name: `burr` (Burr Type III; also known as the Dagum distribution)
Type: continuous
Standard support: \(x > 0\) (SciPy defines the PDF for \(x \ge 0\); depending on parameters, the density can diverge as \(x \to 0^+\))
Parameter space (standard form): \(c > 0\), \(d > 0\)
Location/scale form (SciPy): \(X = \mathrm{loc} + \mathrm{scale}\cdot Y\) with \(Y \sim \mathrm{BurrIII}(c, d)\)
Support becomes \(x > \mathrm{loc}\)
\(\mathrm{scale} > 0\)
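As a quick illustration (a minimal sketch with arbitrary loc/scale values and seed), SciPy applies this convention by shifting and scaling a standard draw:

# Sketch: SciPy's loc/scale convention, X = loc + scale * Y with Y standard.
y = stats.burr.rvs(3.0, 2.0, size=5, random_state=np.random.default_rng(0))
x_manual = 1.0 + 2.0 * y  # loc=1, scale=2 applied by hand
x_scipy = stats.burr.rvs(3.0, 2.0, loc=1.0, scale=2.0, size=5,
                         random_state=np.random.default_rng(0))
print(np.allclose(x_manual, x_scipy))  # same seed -> identical draws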
2) Intuition & Motivation#
What it models#
The Burr Type III/Dagum family is a positive, right-skewed, heavy-tailed model. It can represent distributions where:
very small values are possible (depending on \(c d\))
extremely large values occasionally occur
the right tail decays like a power law
Typical real-world use cases#
Income/wealth modeling (Dagum distribution is common in economics)
Insurance claim sizes and other positive heavy-tailed costs
Reliability / lifetime modeling when failures can occur very late
Hydrology and environmental quantities with occasional extremes
Relations to other distributions#
Log-logistic (Fisk): when \(d = 1\), \(F(x) = \frac{1}{1 + x^{-c}}\), and \(\log X\) is logistic with scale \(1/c\).
Reciprocal relationship to Burr XII: if \(X \sim \mathrm{BurrIII}(c,d)\), then \(1/X \sim \mathrm{BurrXII}(c,d)\). SciPy provides both `scipy.stats.burr` (Type III/Dagum) and `scipy.stats.burr12` (Type XII).
Key transformation: \(T = X^{-c}\) follows a Lomax (Pareto II) distribution with shape \(d\). This turns many calculations into Beta/Gamma-function identities.
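Both relations are easy to check numerically. A minimal sketch (illustrative parameters \(c=3\), \(d=2\); a fresh seeded generator so later cells are unaffected):

# If X ~ BurrIII(c, d): 1/X ~ BurrXII(c, d) and T = X^{-c} ~ Lomax(d).
rng_rel = np.random.default_rng(0)
c_rel, d_rel = 3.0, 2.0
x_rel = stats.burr.rvs(c_rel, d_rel, size=100_000, random_state=rng_rel)
for t in [0.5, 1.0, 2.0]:
    print(f"t={t}:",
          "P(1/X<=t) ~", round(float(np.mean(1.0 / x_rel <= t)), 4),
          "| burr12 cdf:", round(float(stats.burr12.cdf(t, c_rel, d_rel)), 4),
          "| P(X^-c<=t) ~", round(float(np.mean(x_rel ** (-c_rel) <= t)), 4),
          "| lomax cdf:", round(float(stats.lomax.cdf(t, d_rel)), 4))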
3) Formal Definition#
CDF#
In the standard (unshifted, unit-scale) parameterization, for \(x>0\):
$$F(x) = \left(1 + x^{-c}\right)^{-d}$$
PDF#
Differentiating the CDF gives the density:
$$f(x) = c\,d\,\frac{x^{-c-1}}{\left(1 + x^{-c}\right)^{d+1}}, \qquad x > 0$$
Quantile function (inverse CDF)#
Let \(p \in (0,1)\). Solving \(p = (1 + x^{-c})^{-d}\) gives:
$$Q(p) = \left(p^{-1/d} - 1\right)^{-1/c}$$
Location/scale#
SciPy uses the standard location/scale convention:
$$f(x;\, c, d, \mathrm{loc}, \mathrm{scale}) = \frac{1}{\mathrm{scale}}\, f\!\left(\frac{x - \mathrm{loc}}{\mathrm{scale}};\, c, d\right), \qquad x > \mathrm{loc}$$
def burr_logpdf(x, c, d):
"""Log-PDF of the standard Burr Type III / Dagum distribution (loc=0, scale=1)."""
x = np.asarray(x, dtype=float)
out = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
xm = x[mask]
logx = np.log(xm)
# log(1 + x^{-c}) = log(1 + exp(-c log x)) computed stably
log1p_xnegc = np.logaddexp(0.0, -c * logx)
out[mask] = (
np.log(c)
+ np.log(d)
+ (-c - 1.0) * logx
- (d + 1.0) * log1p_xnegc
)
return out
def burr_pdf(x, c, d):
return np.exp(burr_logpdf(x, c, d))
def burr_logcdf(x, c, d):
"""Log-CDF of the standard Burr Type III / Dagum distribution."""
x = np.asarray(x, dtype=float)
out = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
xm = x[mask]
logx = np.log(xm)
log1p_xnegc = np.logaddexp(0.0, -c * logx)
out[mask] = -d * log1p_xnegc
return out
def burr_cdf(x, c, d):
x = np.asarray(x, dtype=float)
out = np.zeros_like(x, dtype=float)
mask = x > 0
out[mask] = np.exp(burr_logcdf(x[mask], c, d))
return out
def burr_ppf(p, c, d):
"""Quantile function Q(p) for p in [0,1]."""
p = np.asarray(p, dtype=float)
x = np.full_like(p, np.nan, dtype=float)
x[p == 0] = 0.0
x[p == 1] = np.inf
mask = (p > 0) & (p < 1)
# p^{-1/d} - 1 = exp(-log p / d) - 1; expm1 is stable when p ~ 1.
t = np.expm1(-np.log(p[mask]) / d)
x[mask] = np.power(t, -1.0 / c)
return x
def burr_rvs_numpy(c, d, size, rng=None):
"""NumPy-only sampler via inverse-transform sampling."""
rng = np.random.default_rng() if rng is None else rng
u = rng.random(size)
return burr_ppf(u, c, d)
def burr_raw_moment(k, c, d):
"""Raw moment E[X^k] for k < c; returns +inf when the moment diverges."""
if k >= c:
return np.inf
return np.exp(gammaln(1.0 - k / c) + gammaln(d + k / c) - gammaln(d))
def burr_entropy(c, d):
"""Differential entropy of the standard Burr Type III / Dagum distribution."""
return -np.log(c * d) - (1.0 + 1.0 / c) * (psi(1.0) - psi(d)) + 1.0 + 1.0 / d
def burr_summary_stats(c, d):
"""Mean/variance/skewness/excess kurtosis (when finite), else nan/inf."""
mean = burr_raw_moment(1.0, c, d) if c > 1 else np.inf
if c <= 2:
return mean, np.inf, np.nan, np.nan
m2 = burr_raw_moment(2.0, c, d)
var = m2 - mean**2
skew = np.nan
exkurt = np.nan
if c > 3:
m3 = burr_raw_moment(3.0, c, d)
mu3 = m3 - 3 * m2 * mean + 2 * mean**3
skew = mu3 / (var ** 1.5)
if c > 4:
m3 = burr_raw_moment(3.0, c, d) # defined since c>4
m4 = burr_raw_moment(4.0, c, d)
mu4 = m4 - 4 * m3 * mean + 6 * m2 * mean**2 - 3 * mean**4
exkurt = mu4 / (var**2) - 3.0
return mean, var, skew, exkurt
4) Moments & Properties#
Existence of moments (key takeaway)#
The right tail behaves like a power law:
$$\Pr(X > x) = 1 - \left(1 + x^{-c}\right)^{-d} \sim d\,x^{-c} \quad \text{as } x \to \infty$$
So the \(k\)-th moment exists iff \(k < c\) (independent of \(d\)).
Raw moments#
For \(k < c\):
$$\mathbb{E}[X^k] = \frac{\Gamma\!\left(1 - \tfrac{k}{c}\right)\Gamma\!\left(d + \tfrac{k}{c}\right)}{\Gamma(d)}$$
Mean and variance#
Mean (exists for \(c>1\)): $$\mathbb{E}[X] = \frac{\Gamma\left(1 - \tfrac{1}{c}\right)\,\Gamma\left(d + \tfrac{1}{c}\right)}{\Gamma(d)}$$
Second moment (exists for \(c>2\)): $$\mathbb{E}[X^2] = \frac{\Gamma\left(1 - \tfrac{2}{c}\right)\,\Gamma\left(d + \tfrac{2}{c}\right)}{\Gamma(d)}$$
Variance (exists for \(c>2\)): $$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$
Skewness and kurtosis#
Skewness exists for \(c>3\)
Excess kurtosis exists for \(c>4\)
You can compute them from raw moments \(m_k = \mathbb{E}[X^k]\) via standard formulas.
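For reference (these are exactly what `burr_summary_stats` above implements), with \(\sigma^2 = m_2 - m_1^2\):

$$\text{skewness} = \frac{m_3 - 3 m_2 m_1 + 2 m_1^3}{\sigma^3}, \qquad \text{excess kurtosis} = \frac{m_4 - 4 m_3 m_1 + 6 m_2 m_1^2 - 3 m_1^4}{\sigma^4} - 3$$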
MGF / characteristic function#
The MGF \(M(t)=\mathbb{E}[e^{tX}]\) diverges for any \(t>0\) because the tail is polynomial.
The characteristic function \(\varphi(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\), but does not have a simple elementary closed form (it can be expressed using special functions and evaluated numerically).
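As an aside (a sketch, not a standard recipe): \(\varphi(t)\) can be evaluated by quadrature against the PDF and cross-checked with a Monte Carlo average of \(e^{itX}\). Parameters and tolerances below are illustrative.

# Numerically evaluate phi(t) = E[exp(itX)] two ways (illustrative c, d, t).
from scipy.integrate import quad
c_cf, d_cf, t_cf = 3.5, 2.0, 1.0
re_part, _ = quad(lambda x: np.cos(t_cf * x) * stats.burr.pdf(x, c_cf, d_cf), 0, np.inf, limit=200)
im_part, _ = quad(lambda x: np.sin(t_cf * x) * stats.burr.pdf(x, c_cf, d_cf), 0, np.inf, limit=200)
samples_cf = stats.burr.rvs(c_cf, d_cf, size=200_000, random_state=np.random.default_rng(1))
print("quadrature: ", complex(re_part, im_part))
print("monte carlo:", np.mean(np.exp(1j * t_cf * samples_cf)))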
Entropy#
The (differential) entropy has a closed form:
$$h(c, d) = -\ln(c\,d) - \left(1 + \tfrac{1}{c}\right)\bigl(\psi(1) - \psi(d)\bigr) + 1 + \tfrac{1}{d}$$
where \(\psi\) is the digamma function.
c0, d0 = 3.5, 2.0
x_test = np.array([0.2, 0.5, 1.0, 2.0, 5.0])
pdf_np = burr_pdf(x_test, c0, d0)
pdf_sp = stats.burr.pdf(x_test, c0, d0)
print("max |pdf_numpy - pdf_scipy|:", np.max(np.abs(pdf_np - pdf_sp)))
mean, var, skew, exkurt = burr_summary_stats(c0, d0)
mean_sp, var_sp, skew_sp, exkurt_sp = stats.burr.stats(c0, d0, moments="mvsk")
print("mean:", mean, "(scipy:", float(mean_sp), ")")
print("var:", var, "(scipy:", float(var_sp), ")")
print("skew:", skew, "(scipy:", float(skew_sp), ")")
print("excess kurtosis:", exkurt, "(scipy:", float(exkurt_sp), ")")
h = burr_entropy(c0, d0)
print("entropy:", h, "(scipy:", float(stats.burr.entropy(c0, d0)), ")")
print("\nMoment existence demo (moment finite iff k < c):")
for k in [0.5, 1.0, 2.0, 3.0, 4.0]:
m = burr_raw_moment(k, c0, d0)
print(f"E[X^{k}] = {m}")
max |pdf_numpy - pdf_scipy|: 3.3306690738754696e-16
mean: 1.4760910375888234 (scipy: 1.4760910375888239 )
var: 0.7147250598130057 (scipy: 0.7147250598130044 )
skew: 8.514406493629222 (scipy: 8.514406493629242 )
excess kurtosis: nan (scipy: nan )
entropy: 0.8398041366589724 (scipy: 0.8398041366582019 )
Moment existence demo (moment finite iff k < c):
E[X^0.5] = 1.1821440631620528
E[X^1.0] = 1.4760910375888234
E[X^2.0] = 2.893569811063055
E[X^3.0] = 11.525904615830015
E[X^4.0] = inf
5) Parameter Interpretation#
The parameters \(c\) and \(d\) both affect shape, but in different ways.
\(c\) (tail index)#
Controls the right-tail exponent: \(\Pr(X>x) \sim d\,x^{-c}\).
Larger \(c\) means a lighter tail and more finite moments.
mean exists for \(c>1\)
variance exists for \(c>2\)
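A quick numeric illustration of the tail approximation (a sketch with arbitrary parameters):

# Survival function vs its power-law approximation d * x^{-c} for large x.
c_tail, d_tail = 2.5, 2.0
for x0 in [10.0, 100.0, 1000.0]:
    print(x0, float(stats.burr.sf(x0, c_tail, d_tail)), d_tail * x0 ** (-c_tail))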
\(d\) (body / lower-tail behavior)#
Controls the body and behavior near 0.
As \(x\to 0^+\), \(F(x) \approx x^{c d}\) and \(f(x) \approx c d\,x^{c d - 1}\).
If \(c d \ge 1\), the PDF is finite at 0.
If \(c d < 1\), the PDF diverges at 0.
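A numeric check of the small-\(x\) behavior (sketch; parameters arbitrary): the ratio \(F(x)/x^{cd}\) should approach 1 as \(x \to 0^+\).

# F(x) / x^{c d} -> 1 as x -> 0.
c_nz, d_nz = 3.0, 0.4
for x0 in [1e-1, 1e-2, 1e-3]:
    print(x0, float(stats.burr.cdf(x0, c_nz, d_nz) / x0 ** (c_nz * d_nz)))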
A useful mental model comes from the transformation \(T = X^{-c}\):
\(T\) follows a Lomax distribution with shape \(d\).
Then \(X = T^{-1/c}\) stretches/compresses that Lomax behavior on a log scale.
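This also yields an alternative sampler (a sketch): draw \(T\) from `scipy.stats.lomax` and set \(X = T^{-1/c}\).

# Lomax-based sampler for BurrIII(c, d): X = T^{-1/c} with T ~ Lomax(d).
rng_lx = np.random.default_rng(7)
c_lx, d_lx = 3.0, 2.0
t_lx = stats.lomax.rvs(d_lx, size=50_000, random_state=rng_lx)
x_lx = t_lx ** (-1.0 / c_lx)
print("empirical quantiles:  ", np.quantile(x_lx, [0.1, 0.5, 0.9]))
print("theoretical quantiles:", stats.burr.ppf([0.1, 0.5, 0.9], c_lx, d_lx))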
x = np.logspace(-3, 3, 800)
# Effect of changing c (tail index)
d_fixed = 2.0
c_list = [1.2, 2.0, 5.0]
fig_pdf_c = go.Figure()
for c in c_list:
y = np.maximum(burr_pdf(x, c, d_fixed), 1e-300)
fig_pdf_c.add_trace(
go.Scatter(x=x, y=y, mode="lines", name=f"c={c}, d={d_fixed}")
)
fig_pdf_c.update_layout(title="PDF shape when varying c (d fixed)")
fig_pdf_c.update_xaxes(type="log", title="x")
fig_pdf_c.update_yaxes(type="log", title="pdf(x)")
fig_pdf_c.show()
# Effect of changing d (body / lower-tail behavior)
c_fixed = 3.0
d_list = [0.5, 1.0, 3.0]
fig_pdf_d = go.Figure()
for d in d_list:
y = np.maximum(burr_pdf(x, c_fixed, d), 1e-300)
fig_pdf_d.add_trace(
go.Scatter(x=x, y=y, mode="lines", name=f"c={c_fixed}, d={d}")
)
fig_pdf_d.update_layout(title="PDF shape when varying d (c fixed)")
fig_pdf_d.update_xaxes(type="log", title="x")
fig_pdf_d.update_yaxes(type="log", title="pdf(x)")
fig_pdf_d.show()
# CDF view (often easier to interpret)
fig_cdf = go.Figure()
for c in c_list:
fig_cdf.add_trace(
go.Scatter(x=x, y=burr_cdf(x, c, d_fixed), mode="lines", name=f"c={c}, d={d_fixed}")
)
fig_cdf.update_layout(title="CDF when varying c (d fixed)")
fig_cdf.update_xaxes(type="log", title="x")
fig_cdf.update_yaxes(title="cdf(x)")
fig_cdf.show()
6) Derivations#
A clean way to derive many results is to transform \(X\) into a Lomax random variable.
Step 1: show \(T = X^{-c}\) is Lomax#
Let \(T = X^{-c}\). For \(t>0\):
$$\Pr(T \le t) = \Pr\left(X^{-c} \le t\right) = \Pr\left(X \ge t^{-1/c}\right) = 1 - F\left(t^{-1/c}\right)$$
But
$$F\left(t^{-1/c}\right) = \left(1 + (t^{-1/c})^{-c}\right)^{-d} = (1 + t)^{-d}$$
So
$$\Pr(T \le t) = 1 - (1 + t)^{-d},$$
which is exactly the Lomax/Pareto-II CDF with shape \(d\) and unit scale.
Step 2: derive \(\mathbb{E}[X^k]\)#
Since \(X = T^{-1/c}\),
$$\mathbb{E}[X^k] = \mathbb{E}\left[T^{-k/c}\right]$$
With Lomax density \(g(t) = d (1+t)^{-d-1}\) for \(t>0\),
$$\mathbb{E}\left[T^{-k/c}\right] = d \int_0^\infty t^{-k/c}\,(1+t)^{-d-1}\,dt$$
This is a Beta-function integral (valid when \(-1 < -k/c\), i.e. \(k<c\)):
$$\int_0^\infty t^{a}\,(1+t)^{-d-1}\,dt = B(a+1,\, d-a) = \frac{\Gamma(a+1)\,\Gamma(d-a)}{\Gamma(d+1)}$$
with \(a = -k/c\). Substituting and simplifying yields
$$\mathbb{E}[X^k] = \frac{\Gamma\!\left(1 - \tfrac{k}{c}\right)\Gamma\!\left(d + \tfrac{k}{c}\right)}{\Gamma(d)}$$
Variance#
Plug \(k=1\) and \(k=2\) into the raw-moment formula and use
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$
(finite only when \(c>2\)).
Likelihood#
Given i.i.d. data \(x_1,\dots,x_n\) (all \(>0\)) in the standard form, the log-likelihood is:
$$\ell(c, d) = n\ln c + n\ln d - (c+1)\sum_{i=1}^n \ln x_i - (d+1)\sum_{i=1}^n \ln\!\left(1 + x_i^{-c}\right)$$
There is no closed-form MLE for \((c,d)\) in general; numerical optimization is used in practice (see `scipy.stats.burr.fit`).
def burr_loglik(c, d, x):
x = np.asarray(x, dtype=float)
if (c <= 0) or (d <= 0) or np.any(x <= 0):
return -np.inf
return float(np.sum(burr_logpdf(x, c, d)))
# quick sanity check: log-likelihood is higher near the true parameters (on average)
c_true, d_true = 3.0, 2.0
x_data = burr_rvs_numpy(c_true, d_true, size=2000, rng=rng)
for (c_try, d_try) in [(2.0, 2.0), (3.0, 2.0), (4.0, 2.0), (3.0, 1.0), (3.0, 3.0)]:
print((c_try, d_try), burr_loglik(c_try, d_try, x_data))
(2.0, 2.0) -2336.3535391062123
(3.0, 2.0) -2114.319013644667
(4.0, 2.0) -2273.9298182857347
(3.0, 1.0) -2514.0593433068734
(3.0, 3.0) -2289.942828886021
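Beyond trying a few candidate values, the log-likelihood can be maximized directly. A minimal sketch with `scipy.optimize.minimize`, reusing `burr_loglik` and `x_data` from the cells above (the log-parameterization and the starting point are choices, not requirements):

# Direct numerical MLE for (c, d); optimize in log-space so both stay positive.
from scipy.optimize import minimize

def neg_loglik(theta, x):
    c, d = np.exp(theta)  # exp ensures c > 0 and d > 0
    return -burr_loglik(c, d, x)

res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(x_data,), method="Nelder-Mead")
c_mle, d_mle = np.exp(res.x)
print("MLE (c, d):", float(c_mle), float(d_mle), "  true:", (c_true, d_true))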
7) Sampling & Simulation (NumPy-only)#
The standard Burr III CDF is
$$F(x) = \left(1 + x^{-c}\right)^{-d}$$
Using inverse-transform sampling:
Draw \(U \sim \mathrm{Uniform}(0,1)\)
Set \(U = (1 + X^{-c})^{-d}\)
Solve: $$U^{-1/d} = 1 + X^{-c} \;\Rightarrow\; X = \left(U^{-1/d} - 1\right)^{-1/c}$$
This gives a fast sampler without any rejection steps.
c_samp, d_samp = 3.0, 2.0
samples = burr_rvs_numpy(c_samp, d_samp, size=50_000, rng=rng)
qs = np.array([0.1, 0.5, 0.9, 0.99])
q_emp = np.quantile(samples, qs)
q_theory = burr_ppf(qs, c_samp, d_samp)
print("Quantiles p:", qs)
print("Empirical:", q_emp)
print("Theory:", q_theory)
print("\nSample mean/var (finite here since c=3>2):")
print("mean:", samples.mean(), "(theory:", burr_raw_moment(1, c_samp, d_samp), ")")
print("var:", samples.var(), "(theory:", burr_raw_moment(2, c_samp, d_samp) - burr_raw_moment(1, c_samp, d_samp) ** 2, ")")
Quantiles p: [0.1 0.5 0.9 0.99]
Empirical: [0.775767 1.346368 2.6389 5.900616]
Theory: [0.773326 1.341504 2.644159 5.833366]
Sample mean/var (finite here since c=3>2):
mean: 1.6135746770341746 (theory: 1.612266101541527 )
var: 1.4486312796347882 (theory: 1.4312632716739029 )
8) Visualization#
We’ll visualize:
the theoretical PDF and CDF
a Monte Carlo sample (histogram + empirical CDF)
c_vis, d_vis = 3.0, 2.0
x_grid = np.logspace(-3, 3, 800)
# PDF
fig_pdf = go.Figure()
fig_pdf.add_trace(
go.Scatter(x=x_grid, y=np.maximum(burr_pdf(x_grid, c_vis, d_vis), 1e-300), mode="lines", name="pdf")
)
fig_pdf.update_layout(title=f"Burr III PDF (c={c_vis}, d={d_vis})")
fig_pdf.update_xaxes(type="log", title="x")
fig_pdf.update_yaxes(type="log", title="pdf(x)")
fig_pdf.show()
# CDF
fig_cdf2 = go.Figure()
fig_cdf2.add_trace(go.Scatter(x=x_grid, y=burr_cdf(x_grid, c_vis, d_vis), mode="lines", name="cdf"))
fig_cdf2.update_layout(title=f"Burr III CDF (c={c_vis}, d={d_vis})")
fig_cdf2.update_xaxes(type="log", title="x")
fig_cdf2.update_yaxes(title="cdf(x)")
fig_cdf2.show()
# Monte Carlo samples: histogram + PDF overlay
samples_vis = burr_rvs_numpy(c_vis, d_vis, size=30_000, rng=rng)
fig_hist = px.histogram(
samples_vis,
nbins=80,
histnorm="probability density",
log_x=True,
opacity=0.55,
title=f"Monte Carlo histogram vs PDF (c={c_vis}, d={d_vis})",
)
fig_hist.add_trace(
go.Scatter(x=x_grid, y=burr_pdf(x_grid, c_vis, d_vis), mode="lines", name="pdf")
)
fig_hist.update_xaxes(title="x")
fig_hist.update_yaxes(title="density")
fig_hist.show()
# Empirical CDF vs theoretical CDF
x_sorted = np.sort(samples_vis)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)
fig_ecdf = go.Figure()
fig_ecdf.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF"))
fig_ecdf.add_trace(go.Scatter(x=x_grid, y=burr_cdf(x_grid, c_vis, d_vis), mode="lines", name="theoretical CDF"))
fig_ecdf.update_layout(title="Empirical vs theoretical CDF")
fig_ecdf.update_xaxes(type="log", title="x")
fig_ecdf.update_yaxes(title="CDF")
fig_ecdf.show()
9) SciPy Integration (scipy.stats.burr)#
SciPy provides a ready-to-use implementation:
`stats.burr.pdf(x, c, d, loc=0, scale=1)`
`stats.burr.cdf(x, c, d, loc=0, scale=1)`
`stats.burr.rvs(c, d, loc=0, scale=1, size=..., random_state=...)`
`stats.burr.fit(data, ...)` (MLE)
Reminder: SciPy's `burr` is Burr Type III / Dagum. SciPy's `burr12` is Burr Type XII.
c_true, d_true = 3.0, 2.0
dist = stats.burr(c_true, d_true) # loc=0, scale=1 by default
x_eval = np.array([0.5, 1.0, 2.0, 5.0])
print("pdf:", dist.pdf(x_eval))
print("cdf:", dist.cdf(x_eval))
# rvs
data = dist.rvs(size=3000, random_state=rng)
print("sample min/max:", data.min(), data.max())
# fit (fix loc=0, scale=1 to estimate only c and d)
c_hat, d_hat, loc_hat, scale_hat = stats.burr.fit(data, floc=0, fscale=1)
print("\nTrue (c,d):", (c_true, d_true))
print("Fit (c,d):", (c_hat, d_hat))
print("Returned loc/scale:", (loc_hat, scale_hat))
# Compare numpy vs SciPy implementations numerically
x_dense = np.logspace(-3, 3, 1000)
max_pdf_diff = np.max(np.abs(burr_pdf(x_dense, c_true, d_true) - dist.pdf(x_dense)))
max_cdf_diff = np.max(np.abs(burr_cdf(x_dense, c_true, d_true) - dist.cdf(x_dense)))
print("\nmax |pdf_numpy - pdf_scipy|:", max_pdf_diff)
print("max |cdf_numpy - cdf_scipy|:", max_cdf_diff)
pdf: [0.131687 0.75 0.263374 0.009373]
cdf: [0.012346 0.25 0.790123 0.98419 ]
sample min/max: 0.21869142304750552 14.36061158934588
True (c,d): (3.0, 2.0)
Fit (c,d): (3.0007293936222674, 1.9803751910399106)
Returned loc/scale: (0, 1)
max |pdf_numpy - pdf_scipy|: 5.551115123125783e-16
max |cdf_numpy - cdf_scipy|: 2.220446049250313e-16
10) Statistical Use Cases#
Hypothesis testing#
Nested model test: the case \(d=1\) is log-logistic. You can test \(H_0: d=1\) vs \(H_1: d \ne 1\) using a likelihood-ratio test (LRT).
Goodness-of-fit: QQ-plots or distribution tests (KS/AD) can be used as diagnostics. Be careful: classical p-values assume parameters are known, while in practice they’re often estimated.
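A minimal goodness-of-fit sketch illustrating this caveat (arbitrary parameters; the second KS p-value is optimistic because the parameters were estimated from the same data):

# KS test with known parameters (valid) vs estimated parameters (optimistic).
rng_ks = np.random.default_rng(3)
x_ks = stats.burr.rvs(3.0, 2.0, size=1000, random_state=rng_ks)
print(stats.kstest(x_ks, stats.burr(3.0, 2.0).cdf))        # parameters known
c_ks, d_ks, _, _ = stats.burr.fit(x_ks, floc=0, fscale=1)  # parameters estimated
print(stats.kstest(x_ks, stats.burr(c_ks, d_ks).cdf))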
Bayesian modeling#
When modeling positive heavy-tailed data, you can use Burr III as a likelihood and place priors on \(c\) and \(d\) (e.g., log-normal or log-uniform priors). There is no conjugate prior, but posterior inference is straightforward with MCMC or even a simple grid approximation in low dimensions.
Generative modeling#
Useful as a base distribution for positive heavy-tailed generative models.
Can be used in mixtures (mixture of Burrs) to model multimodal positive data.
Reciprocal relationship to Burr XII can be exploited depending on whether your domain is naturally modeled by \(X\) or \(1/X\).
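A minimal two-component mixture sampler (sketch; the weights and component parameters are arbitrary):

# Sample from 0.7 * BurrIII(4, 1) + 0.3 * BurrIII(2, 3).
rng_mix = np.random.default_rng(5)
n_mix = 10_000
is_first = rng_mix.random(n_mix) < 0.7
mix = np.where(
    is_first,
    stats.burr.rvs(4.0, 1.0, size=n_mix, random_state=rng_mix),
    stats.burr.rvs(2.0, 3.0, size=n_mix, random_state=rng_mix),
)
print("mixture quantiles:", np.quantile(mix, [0.1, 0.5, 0.9]))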
# Likelihood-ratio test example: H0: d=1 (log-logistic) vs H1: d free
c0, d0 = 2.5, 1.0
x = stats.burr.rvs(c0, d0, size=1500, random_state=rng)
# Fit under H1 (free c, d) and H0 (d fixed to 1). Fix loc=0, scale=1 for simplicity.
c_hat1, d_hat1, _, _ = stats.burr.fit(x, floc=0, fscale=1)
c_hat0, d_hat0, _, _ = stats.burr.fit(x, f1=1.0, floc=0, fscale=1) # d fixed
ll1 = np.sum(stats.burr.logpdf(x, c_hat1, d_hat1))
ll0 = np.sum(stats.burr.logpdf(x, c_hat0, 1.0))
lrt_stat = 2 * (ll1 - ll0)
p_value = stats.chi2.sf(lrt_stat, df=1)
print("True params:", (c0, d0))
print("Fit H1 (c,d):", (c_hat1, d_hat1))
print("Fit H0 (c,d=1):", (c_hat0, 1.0))
print("LRT stat:", float(lrt_stat))
print("Approx p-value (chi^2_1):", float(p_value))
True params: (2.5, 1.0)
Fit H1 (c,d): (2.547213421880941, 1.025130683635059)
Fit H0 (c,d=1): (2.5686523437500033, 1.0)
LRT stat: 0.7672483125884355
Approx p-value (chi^2_1): 0.3810696572372613
# Simple Bayesian grid posterior over (c,d) with a log-uniform prior p(c,d) ∝ 1/(c d)
# This is an approximation for intuition (not a replacement for MCMC for serious work).
c_true, d_true = 3.0, 2.0
data = stats.burr.rvs(c_true, d_true, size=400, random_state=rng)
logx = np.log(data)
sum_logx = logx.sum()
n = data.size
c_grid = np.linspace(1.1, 6.0, 90) # avoid c<=1 where mean diverges
d_grid = np.linspace(0.2, 6.0, 90)
log_post = np.empty((c_grid.size, d_grid.size), dtype=float)
for i, c in enumerate(c_grid):
# sum_i log(1 + x_i^{-c}) computed stably
s = np.logaddexp(0.0, -c * logx).sum()
# log-likelihood for each d (vectorized)
loglike = n * np.log(c) + n * np.log(d_grid) + (-c - 1.0) * sum_logx - (d_grid + 1.0) * s
# log-uniform prior on (c,d) over the grid bounds
logprior = -np.log(c) - np.log(d_grid)
log_post[i, :] = loglike + logprior
# Normalize on the discrete grid (treating cells as equal-area for visualization)
log_post -= logsumexp(log_post)
post = np.exp(log_post)
i_map, j_map = np.unravel_index(np.argmax(post), post.shape)
print("True (c,d):", (c_true, d_true))
print("MAP (c,d):", (float(c_grid[i_map]), float(d_grid[j_map])))
fig_post = go.Figure(
data=go.Contour(
x=d_grid,
y=c_grid,
z=post,
contours_coloring="heatmap",
colorbar_title="posterior",
)
)
fig_post.update_layout(title="Grid posterior p(c,d | data) with log-uniform prior")
fig_post.update_xaxes(title="d")
fig_post.update_yaxes(title="c")
fig_post.show()
True (c,d): (3.0, 2.0)
MAP (c,d): (2.971910112359551, 1.8943820224719101)
11) Pitfalls#
Invalid parameters: require \(c>0\) and \(d>0\). With location/scale, also require \(\mathrm{scale}>0\).
Support mismatch: data must satisfy \(x>0\) in the standard form, and \(x>\mathrm{loc}\) with a location shift.
Non-existent moments: the \(k\)-th moment exists iff \(k<c\).
If \(c\le 1\), the mean diverges.
If \(c\le 2\), the variance diverges.
Near-zero behavior: if \(c d < 1\), the PDF diverges at 0 (this can be fine mathematically, but it affects numerics and interpretation).
Numerical issues: direct computation of \(x^{-c}\) can overflow for tiny \(x\). Prefer log-domain computations (as in `burr_logpdf`).
Fitting:
MLE can be sensitive to starting points and to whether `loc`/`scale` are also being fit.
Use `logpdf` for likelihood work to avoid underflow.
Name confusion: Burr has multiple “types”. SciPy `burr` is Type III (Dagum), while SciPy `burr12` is Type XII.
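A quick demonstration of the overflow pitfall (sketch; `x_tiny` is deliberately extreme, and since \(cd < 1\) here, a huge density near 0 is legitimate), reusing `burr_logpdf` from above:

# Naive PDF evaluation overflows (NumPy emits warnings and yields nan);
# the log-domain version stays finite.
c_nu, d_nu = 5.0, 0.1
x_tiny = np.array([1e-200])
naive = c_nu * d_nu * x_tiny ** (-c_nu - 1) * (1 + x_tiny ** (-c_nu)) ** (-(d_nu + 1))
print("naive pdf: ", naive)  # inf * 0 -> nan
print("log-domain:", np.exp(burr_logpdf(x_tiny, c_nu, d_nu)))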
12) Summary#
`burr` (Burr Type III / Dagum) is a flexible positive, heavy-tailed continuous distribution.
The CDF is \(F(x)=(1+x^{-c})^{-d}\) and the PDF is \(f(x)=cd\,x^{-c-1}/(1+x^{-c})^{d+1}\).
Tail heaviness is controlled by \(c\): moments exist iff \(k<c\).
The transform \(T=X^{-c}\) yields a Lomax distribution, making derivations and sampling straightforward.
Sampling is easy via inverse CDF: \(X=(U^{-1/d}-1)^{-1/c}\).
SciPy integration is direct via `scipy.stats.burr` (and `burr12` for Burr Type XII).